A two-dimensional mid-infrared optoelectronic retina enabling simultaneous perception and encoding

Infrared machine vision systems for object perception and recognition are becoming increasingly important in the Internet of Things era. However, current systems suffer from bulkiness and inefficiency compared with the human retina, with its intelligent and compact neural architecture. Here, we present a retina-inspired mid-infrared (MIR) optoelectronic device based on a two-dimensional (2D) heterostructure for simultaneous data perception and encoding. A single device can perceive the illumination intensity of a MIR stimulus signal while encoding that intensity into a spike train, based on a rate encoding algorithm, for subsequent neuromorphic computing. The encoding is assisted by an all-optical excitation mechanism, a stochastic near-infrared (NIR) sampling terminal. The device features a wide dynamic working range, high encoding precision, and flexible adaptation to the MIR intensity. Moreover, an inference accuracy of more than 96% on the MIR MNIST data set encoded by the device is achieved using a trained spiking neural network (SNN).

where IDS is the source-drain current, T is the temperature, and ΦSB is the Schottky barrier … Even so, the current at negative bias is lower than that at positive bias.

Supplementary Note 2. Photoresponse characteristics of the b-AsP/MoTe2 heterostructure
The laser spots of the MIR (4.6 μm) laser and the NIR (730 nm) laser are about 100 μm in size, which is larger than the as-fabricated 2D b-AsP/MoTe2 heterostructures. Thus, the entire device can be considered to be uniformly illuminated.
As shown in Supplementary Fig. 7a … 11. More experiments were performed to support the above points, as follows. First, we measured the Seebeck coefficient of b-AsP by fabricating a thermoelectric device. As shown in Supplementary Fig. 9, applying a voltage to the heater injects heating power into the device and creates a temperature gradient (ΔT) across the b-AsP. The local temperatures at both ends of the b-AsP are read out by the pre-calibrated thermometer-1 and thermometer-2. Simultaneously, the thermoelectric voltage (ΔV) induced by the temperature gradient is measured by thermometer-1/2. The thermoelectric voltage (ΔV) plotted against the temperature gradient (ΔT) follows a linear trend, from whose slope we extract the Seebeck coefficient of b-AsP to be S = ΔV/ΔT = 723.66 μV/K. In the same way, the Seebeck coefficient of MoTe2 was measured to be 142.59 μV/K (Supplementary Fig. 10). In addition, we evaluated the photo-Seebeck coefficient of b-AsP by combining temperature-dependent and power-dependent Raman spectra with local-illumination-induced photovoltage measurements. As shown in Supplementary Fig. 11 … Fig. 21). Such a fast and stable response makes it possible to generate higher spiking rates and provides a guarantee for high-precision MIR intensity coding.

Supplementary Note 4. Rules for determining the encoding parameters for high-precision encoding.
Supplementary Fig. 24 shows a schematic diagram of how to determine suitable encoding parameters (mean (u), variance (σ) and spiking threshold current (ITC)) to realize high encoding precision. If the linear region of the encoding transfer curve is shifted to the middle of the MIR power range of interest, the corresponding parameters can be regarded as optimal. The rules for setting the encoding parameters are as follows. First, we need to obtain the photocurrent (IDS) as a function of PMIR and PNIR, a schematic of which is shown in Supplementary Fig. 24a. Then, we need to determine the largest … 2 )/2 (5) that is convenient for us to directly extract it from the IDS curve. For a relatively low PNIR range, high-precision encoding can be realized by properly decreasing u and σ at a fixed ITC. The final encoding transfer curve is given in Supplementary Fig. 24d-e.
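The role of the three encoding parameters can be illustrated with a minimal simulation sketch. It is not the authors' code: the function `photocurrent` is a hypothetical linear, additive stand-in for the measured IDS(PMIR, PNIR) surface, the coefficients `k_mir` and `k_nir` and all numerical values are arbitrary, and the sign convention (a spike is emitted when the current exceeds ITC) is an assumption.

```python
import random

def photocurrent(p_mir, p_nir, k_mir=1.0, k_nir=1.0):
    """Hypothetical linear, additive response in arbitrary units.
    A stand-in for the fitted I_DS(P_MIR, P_NIR) surface, not the device model."""
    return k_mir * p_mir + k_nir * p_nir

def encode_spike_train(p_mir, u, sigma, i_tc, n_steps, rng):
    """Return a list of 0/1 spikes, one per stochastic NIR sampling step."""
    spikes = []
    for _ in range(n_steps):
        # Draw the NIR power for this step; clip at zero because an
        # optical power cannot be negative.
        p_nir = max(0.0, rng.gauss(u, sigma))
        # Assumed rule: emit a spike when the current exceeds the threshold.
        spikes.append(1 if photocurrent(p_mir, p_nir) > i_tc else 0)
    return spikes

def spike_rate(spikes):
    """Fraction of sampling steps that produced a spike."""
    return sum(spikes) / len(spikes)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rates = [
    spike_rate(encode_spike_train(p, u=50.0, sigma=20.0, i_tc=100.0,
                                  n_steps=2000, rng=rng))
    for p in (0.0, 50.0, 100.0)
]
```

Under these assumptions the spike rate grows monotonically with the MIR power, and the middle of the transfer curve (rate near 0.5) sits where the MIR contribution plus the mean NIR contribution equals ITC. This is why shifting u or ITC moves the linear region across the MIR range.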

Supplementary Note 5. Analytical results of the device and spiking neural network
The response speed of our device to NIR light is a critical metric: it determines how fast the device can encode an input MIR signal and how many time-steps in one trial of NIR optical pulses the device can afford within a fixed encoding time. The rise time of our device under NIR light is 600 ns, and the fall time is 3.7 μs. This fast response allows the sampling period (TS) to be reduced to 10 μs, at which the encoding performance has already been shown in Fig. 2. We also simulated encoding the intensity of a clock image using 10 and 100 time-steps, respectively. The results indicate that more time-steps, made possible by the high response speed of the device, enable more precise encoding.
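A short worked example may help to see why more time-steps give a more precise rate code. The sketch below assumes that every sampling step fires independently with the same probability, so the spike count is binomially distributed; this independence is a simplifying assumption and is not stated in the text.

```python
import math

def rate_standard_error(rate, n_steps):
    """Standard deviation of the measured spike rate when each of the
    n_steps sampling steps fires independently with probability `rate`."""
    return math.sqrt(rate * (1.0 - rate) / n_steps)

def rate_resolution(n_steps):
    """Smallest rate increment representable with n_steps steps,
    since the spike count is an integer between 0 and n_steps."""
    return 1.0 / n_steps

for n in (10, 100):
    print(n, round(rate_standard_error(0.5, n), 3), rate_resolution(n))
# prints:
# 10 0.158 0.1
# 100 0.05 0.01
```

Going from 10 to 100 time-steps therefore reduces both the statistical spread of the rate (by a factor of about 3.2 at a rate of 0.5) and the quantization step (by a factor of 10).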
A current response model of our device to PMIR and PNIR can be fitted to the measured data. Using this model, the simulated spike rate as a function of PMIR is given in Supplementary Fig. 26a-c. A higher σ extends the dynamic working range to cover the PMIR range from 0 to 80.21 W/cm². The location of the dynamic working range moves with u and ITC.
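The dependence of the dynamic working range on σ, u and ITC can be made explicit with a closed-form sketch. It assumes a hypothetical linear current model, IDS = k_mir·PMIR + k_nir·PNIR with positive coefficients, an unclipped Gaussian PNIR, and a spike whenever IDS exceeds ITC. None of these assumptions is the fitted device model. The working range is taken here, also as an assumption, to be the MIR interval over which the spike rate rises from 10% to 90%.

```python
import math

Z90 = 1.2815515655446004  # 90th percentile of the standard normal distribution

def gaussian_spike_rate(p_mir, u, sigma, i_tc, k_mir=1.0, k_nir=1.0):
    """P(k_mir*p_mir + k_nir*P_nir > i_tc) for P_nir ~ Normal(u, sigma)."""
    p_nir_needed = (i_tc - k_mir * p_mir) / k_nir
    z = (p_nir_needed - u) / sigma
    # Upper-tail probability of the standard normal at z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def working_range(u, sigma, i_tc, k_mir=1.0, k_nir=1.0):
    """MIR powers at which the rate equals 10% and 90% in this linear model."""
    center = (i_tc - k_nir * u) / k_mir        # rate is 0.5 here
    half_width = Z90 * sigma * k_nir / k_mir   # grows linearly with sigma
    return center - half_width, center + half_width

lo, hi = working_range(u=50.0, sigma=20.0, i_tc=100.0)
```

In this sketch the width of the range is proportional to σ, while its center depends only on u and ITC, which mirrors the qualitative trends described above.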
As shown in Supplementary Fig. 27a-b, more hidden neurons improve the classification ability of the SNN at the cost of structural complexity. The membrane potential decay rate (β) describes the ability of the synaptic neuron to memorize earlier information. When β reaches 100%, the accuracy exceeds 95% thanks to the ideal memory of the neurons, but a real synaptic device can hardly have a β of 100%, i.e. no decay of the memorized voltage, so β in this work is set to 95%. The u, σ, ITC and number of sampling points for the two figures are 130 mW/cm², 55 mW/cm², 0 nA and 50, respectively. Supplementary Fig. 27c verifies that the training parameters and iteration number of the SNN are sufficient for loss convergence without underfitting. Additionally, the convergence of the test-set loss shows that the SNN does not overfit.
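The meaning of β can be illustrated with a generic discrete-time leaky integrate-and-fire neuron. This is a sketch under stated assumptions, not the neuron model of the trained SNN: β is interpreted as the fraction of membrane potential retained per time step (so β = 1 means no decay), and the threshold value and the subtractive reset are hypothetical choices.

```python
def lif_neuron(inputs, beta=0.95, threshold=1.0):
    """Discrete-time leaky integrate-and-fire neuron.

    beta is the fraction of the membrane potential kept from one time
    step to the next; beta = 1.0 corresponds to ideal memory.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = beta * v + x          # leak, then integrate the new input
        if v >= threshold:
            spikes.append(1)
            v -= threshold        # subtractive reset (an assumed choice)
        else:
            spikes.append(0)
    return spikes

# Three weak input pulses separated by silent steps.
pulses = [0.4, 0, 0, 0, 0, 0.4, 0, 0, 0, 0, 0.4]
ideal_memory = lif_neuron(pulses, beta=1.0)
leaky_memory = lif_neuron(pulses, beta=0.5)
```

With β = 1.0 the three pulses add up and the last one pushes the potential over the threshold, whereas with β = 0.5 the earlier pulses have leaked away before the next one arrives and no spike is produced.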
The impact of other distributions for sampling NIR light on the encoding precision and recognition accuracy of the SNN is investigated in Supplementary Fig. 29. All distributions can realize the encoding function, but the Gaussian distribution has the highest noise tolerance, with the smallest error in spike rate under the same additive white noise.
Regarding the recognition accuracy, the results are shown in Supplementary Fig. 29d-l. When the encoding parameters (u, σ, R) are optimized, the three distributions reach a similar highest recognition accuracy of up to 96.7% for high-Pmax objects. However, the Uniform distribution has the highest recognition accuracy for objects with Pmax lower than 20 W/cm², followed by the Gaussian and Laplace distributions. This is because the SNN is trained with ideal linear encoding. The encoding transfer curve obtained with the Uniform distribution is more linear than those of the other two distributions, so the Uniform distribution better matches the trained SNN even though its encoding precision is lower than that of the Gaussian distribution. Nevertheless, the Uniform distribution places a heavy demand on the resolution of the NIR laser's output optical power as the number of time steps increases. The output power resolution of our laser is only 0.1 mW, which limits the use of the Uniform distribution in a relatively low encoding range. We therefore choose the Gaussian distribution in the main text to demonstrate our concept and the functionality of our device, at the cost of a small performance sacrifice on the digit recognition task.
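The linearity argument can be checked with a small sketch. It assumes the photocurrent is linear in PNIR, so that the transfer curve has the shape of the upper-tail probability of the sampling distribution; this is a simplification and not the fitted device model. The three distributions are given the same mean u and standard deviation σ, and the linearity measure (the largest gap to the chord over u ± σ) is a hypothetical choice made for illustration.

```python
import math

def survival_gaussian(x, u, sigma):
    """P(X > x) for a Gaussian with mean u and standard deviation sigma."""
    return 0.5 * math.erfc((x - u) / (sigma * math.sqrt(2.0)))

def survival_uniform(x, u, sigma):
    """P(X > x) for a uniform distribution with the same mean and std."""
    half = math.sqrt(3.0) * sigma   # half-width that gives std = sigma
    return min(1.0, max(0.0, (u + half - x) / (2.0 * half)))

def survival_laplace(x, u, sigma):
    """P(X > x) for a Laplace distribution with the same mean and std."""
    b = sigma / math.sqrt(2.0)      # scale that gives std = sigma
    if x < u:
        return 1.0 - 0.5 * math.exp((x - u) / b)
    return 0.5 * math.exp(-(x - u) / b)

def max_linearity_error(survival, u, sigma, n=201):
    """Largest gap between the curve and the straight line joining its
    values at u - sigma and u + sigma, evaluated on n grid points."""
    lo, hi = u - sigma, u + sigma
    s_lo, s_hi = survival(lo, u, sigma), survival(hi, u, sigma)
    worst = 0.0
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        chord = s_lo + (s_hi - s_lo) * (x - lo) / (hi - lo)
        worst = max(worst, abs(survival(x, u, sigma) - chord))
    return worst

errors = {
    "uniform": max_linearity_error(survival_uniform, 0.0, 1.0),
    "gaussian": max_linearity_error(survival_gaussian, 0.0, 1.0),
    "laplace": max_linearity_error(survival_laplace, 0.0, 1.0),
}
```

In this simplified picture the uniform curve is exactly linear inside its support, the Gaussian curve deviates slightly, and the Laplace curve deviates most. This is the same ordering as the recognition accuracies reported above for low-Pmax objects.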
The impact of different device thicknesses and different wavelengths of the stochastic light source is investigated in Supplementary Fig. 30-31. The results reveal that devices with different thicknesses and stochastic light sources of different wavelengths can achieve similar encoding and recognition accuracy if the encoding parameters, including u, σ and ITC, are optimized.